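Graph neural networks (GNNs) have achieved great success in many graph-based applications. However, the enormous size and high sparsity of graphs hinder their use in industrial scenarios. Although some scalable GNNs have been proposed for large-scale graphs, they adopt a fixed $k$-hop neighborhood for every node, and therefore face the over-smoothing problem when large propagation depths are applied to nodes in sparse regions. To address this issue, we propose a new GNN architecture, Graph Attention Multi-Layer Perceptron (GAMLP), which can capture the underlying correlations between different scales of graph knowledge. We have deployed GAMLP with the Angel platform, and we further evaluate GAMLP on both real-world datasets and large-scale industrial datasets. Extensive experiments on these 14 graph datasets show that GAMLP achieves state-of-the-art performance while enjoying high scalability and efficiency. Specifically, it is 1.3% better in predictive accuracy on our large-scale Tencent Video dataset while achieving up to a $50\times$ training speedup. In addition, it ranks first on the leaderboards of the largest homogeneous and heterogeneous graphs of the Open Graph Benchmark (i.e., ogbn-papers100M and ogbn-mag).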
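K-core decomposition is a commonly used metric for analyzing graph structure or studying the relative importance of nodes in complex graphs. In recent years, the scale of graphs has grown rapidly, especially in industrial settings. For example, our industrial partner runs popular social applications with billions of users and is able to gather a rich set of user data. As a result, applying K-core decomposition to large graphs has attracted increasing attention from both academia and industry. A simple but effective way to handle large graphs is to process them in a distributed setting, and several distributed K-core decomposition algorithms have been proposed. Despite their effectiveness, we observe both experimentally and theoretically that these algorithms consume too many resources and become unstable on super-large-scale graphs, especially when the available resources are limited. In this paper, we deal with such super-large-scale graphs and propose a divide-and-conquer strategy on top of the distributed K-core decomposition algorithm. We evaluate our approach on three large graphs. The experimental results show that resource consumption can be significantly reduced and that computation on large-scale graphs becomes more stable than with existing methods. For example, with our divide-and-conquer technique, the distributed K-core decomposition algorithm can scale to a large graph with 136 billion edges without losing correctness.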
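For readers unfamiliar with the underlying quantity, the following is a minimal single-machine sketch of K-core decomposition using the textbook peeling procedure. It is not the distributed or divide-and-conquer algorithm of the abstract; the function name `core_numbers` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict


def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph.

    Textbook peeling: repeatedly remove a node of minimum remaining degree.
    This is an illustrative single-machine sketch, not the distributed method.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Simple O(n) selection; a bucket queue would make this linear overall.
        node = min(remaining, key=degree.__getitem__)
        k = max(k, degree[node])  # core values never decrease during peeling
        core[node] = k
        remaining.remove(node)
        for nbr in adj[node]:
            if nbr in remaining:
                degree[nbr] -= 1
    return core


if __name__ == "__main__":
    # A triangle (1, 2, 3) with one pendant node 4 attached to node 3.
    print(core_numbers([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

In the usage example, the triangle nodes end up in the 2-core while the pendant node only belongs to the 1-core.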
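Decision-based attacks pose a severe threat to real-world applications because they treat the target model as a black box and only access the hard prediction label. Considerable effort has recently been devoted to reducing the number of queries; however, existing decision-based attacks still require thousands of queries to generate good-quality adversarial examples. In this work, we find that a benign sample, the current adversarial example, and the next adversarial example naturally form a triangle in a subspace for any iterative attack. Based on the law of sines, we propose a novel Triangle Attack (TA) that optimizes the perturbation by exploiting the geometric fact that, in any triangle, the longer side is always opposite the larger angle. However, applying this information directly to the input image is ineffective, because it cannot thoroughly explore the neighborhood of the input sample in the high-dimensional space. To address this issue, TA optimizes the perturbation in the low-frequency space for effective dimensionality reduction, which is possible owing to the generality of this geometric property. Extensive evaluations on the ImageNet dataset show that TA achieves a much higher attack success rate within 1,000 queries and needs far fewer queries than existing decision-based attacks to reach the same attack success rate under various perturbation budgets. Given this efficiency, we further demonstrate the applicability of TA to a real-world API, namely the Tencent Cloud API.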
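The geometric fact the abstract relies on can be checked numerically. The sketch below is only an illustration of the law of sines on such a triangle, not the attack itself; the function name and the angle naming (one angle at the current adversarial example, one at the benign sample) are assumptions and need not match the paper's notation.

```python
import math


def next_distance(delta_t, angle_at_current, angle_at_benign):
    """Distance from the benign sample to the next candidate, by the law of sines.

    Triangle vertices: benign sample, current adversarial example, next candidate.
    delta_t is the benign-to-current distance, which lies opposite the angle at
    the next candidate. The returned side lies opposite `angle_at_current`.
    All names here are illustrative assumptions.
    """
    angle_at_next = math.pi - angle_at_current - angle_at_benign
    if not (angle_at_current > 0 and angle_at_benign > 0 and angle_at_next > 0):
        raise ValueError("angles must form a valid triangle")
    return delta_t * math.sin(angle_at_current) / math.sin(angle_at_next)


if __name__ == "__main__":
    # The new side is shorter than delta_t exactly when its opposite angle
    # (angle_at_current) is smaller than the angle at the next candidate.
    print(next_distance(1.0, math.pi / 6, math.pi / 3))
    print(next_distance(1.0, math.pi / 2, math.pi / 3))
```

The first call has the smaller angle opposite the new side, so the distance to the benign sample shrinks; the second call has the larger angle opposite it, so the distance grows.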
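Graph neural networks (GNNs) have recently achieved state-of-the-art performance in many graph-based applications. Despite their high expressive power, they typically need to perform an expensive recursive neighborhood expansion in every training epoch and therefore face scalability issues. Moreover, most of them are inflexible, since they are restricted to fixed-hop neighborhoods and are insensitive to the actual receptive-field demands of different nodes. We circumvent these limitations by introducing a scalable and flexible Graph Attention Multi-Layer Perceptron (GAMLP). By separating the non-linear transformation from feature propagation, GAMLP significantly improves scalability and efficiency, because the propagation procedure can be performed as a pre-computation. With three principled receptive-field attention mechanisms, each node in GAMLP is flexible and adaptive in leveraging the features propagated over receptive fields of different sizes. We conduct extensive evaluations on three large Open Graph Benchmark datasets (ogbn-papers100M, ogbn-products, and ogbn-mag), showing that GAMLP not only achieves state-of-the-art performance but also provides high scalability and efficiency.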
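The two ideas in this abstract, pre-computed propagation and per-node weighting over hops, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not GAMLP itself: the mean aggregation with self-loops, the dot-product scoring, and the fixed `score_vector` (standing in for learned attention parameters) are all assumptions, as are the function names.

```python
import math


def propagate(adj, features, hops):
    """Pre-compute per-hop features [X, AX, A^2 X, ...].

    `adj` is a list of neighbor lists; A is assumed to be mean aggregation
    over each node's neighbors plus itself (an illustrative choice).
    """
    n, dim = len(features), len(features[0])
    neighborhoods = [sorted(set(adj[i]) | {i}) for i in range(n)]
    per_hop = [features]
    for _ in range(hops):
        prev = per_hop[-1]
        per_hop.append([
            [sum(prev[j][d] for j in neighborhoods[i]) / len(neighborhoods[i])
             for d in range(dim)]
            for i in range(n)
        ])
    return per_hop


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(per_hop, node, score_vector):
    """Attention-weighted sum of one node's features across hops.

    `score_vector` is a hypothetical stand-in for learned attention weights.
    """
    rows = [hop[node] for hop in per_hop]
    weights = softmax([sum(a * b for a, b in zip(row, score_vector)) for row in rows])
    return [sum(w * row[d] for w, row in zip(weights, rows)) for d in range(len(rows[0]))]


if __name__ == "__main__":
    path_graph = [[1], [0, 2], [1]]          # 0 - 1 - 2
    x = [[1.0], [0.0], [0.0]]
    hops = propagate(path_graph, x, hops=2)  # done once, before any training
    print(combine(hops, node=0, score_vector=[0.0]))
```

Because `propagate` involves no trainable parameters, it can run once as a preprocessing step, and only the per-node combination (plus a downstream MLP, omitted here) would need training.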
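Attributing the vulnerability of natural language processing models to the fact that similar inputs are mapped to dissimilar representations in the embedding space, which leads to inconsistent outputs, we propose a novel robust training method termed Fast Triplet Metric Learning (FTML). Specifically, we argue that, for better robustness, an original sample should have a representation similar to those of its adversarial counterparts and distinct from those of other samples. To this end, we incorporate triplet metric learning into standard training, pulling words closer to their positive samples (i.e., synonyms) and pushing them away from their negative samples (i.e., non-synonyms) in the embedding space. Extensive experiments show that FTML significantly improves model robustness against various advanced adversarial attacks while maintaining competitive classification accuracy on the original samples. Moreover, our method is efficient, since it only needs to adjust the embedding and adds very little overhead to standard training. Our work shows the great potential of improving textual robustness through robust word embeddings.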
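The pull/push behavior described above is what a triplet loss expresses. The sketch below shows the generic triplet margin loss on plain vectors; the squared Euclidean distance, the margin value, and the function names are assumptions, and the abstract does not specify FTML's exact objective.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (illustrative, not FTML's exact form).

    Zero once the negative (e.g. a non-synonym) is at least `margin` farther
    from the anchor word than the positive (e.g. a synonym).
    """
    return max(
        0.0,
        squared_distance(anchor, positive)
        - squared_distance(anchor, negative)
        + margin,
    )


if __name__ == "__main__":
    word = [0.0, 0.0]
    synonym = [0.0, 1.0]
    print(triplet_loss(word, synonym, negative=[3.0, 0.0]))  # far negative
    print(triplet_loss(word, synonym, negative=[1.0, 0.0]))  # close negative
```

Minimizing this quantity with respect to the embeddings moves synonyms together and non-synonyms apart, which is the effect the abstract describes.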
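A line of work has shown that natural language processing models are vulnerable to adversarial examples. Correspondingly, various defense methods have been proposed to mitigate the threat of textual adversarial examples, such as adversarial training, input transformation, and detection. In this work, we treat the optimization process of synonym-substitution-based textual adversarial attacks as a specific sequence of word replacements, in which each word mutually influences the others. We find that we can destroy this mutual interaction and eliminate the adversarial perturbation by randomly substituting a word with one of its synonyms. Based on this observation, we propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V), which votes on the prediction label by accumulating the logits of k samples generated by randomly substituting words in the input text with synonyms. RS&V is generally applicable to any existing neural network without modifying the architecture or requiring extra training, and it is orthogonal to prior work on making the classification network itself more robust. Empirical evaluations on three benchmark datasets show that RS&V detects textual adversarial examples more successfully than existing detection methods while maintaining high classification accuracy on benign samples.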
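The voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `logits_fn` is a hypothetical callable wrapping any classifier, `synonyms` is a hypothetical word-to-synonyms dictionary, `sub_rate` and `seed` are illustrative parameters, and flagging an input when the voted label differs from the original prediction is an assumption about how detection is decided.

```python
import random


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def voted_label(tokens, synonyms, logits_fn, k=5, sub_rate=0.5, seed=0):
    """Accumulate logits over k randomly synonym-substituted copies and vote.

    Requires k >= 1. `logits_fn(tokens)` must return a list of class scores.
    """
    rng = random.Random(seed)
    total = None
    for _ in range(k):
        sample = [
            rng.choice(synonyms[t]) if t in synonyms and rng.random() < sub_rate else t
            for t in tokens
        ]
        logits = logits_fn(sample)
        total = list(logits) if total is None else [a + b for a, b in zip(total, logits)]
    return argmax(total)


def is_adversarial(tokens, synonyms, logits_fn, **kwargs):
    """Assumed decision rule: flag the input if voting changes the label."""
    return voted_label(tokens, synonyms, logits_fn, **kwargs) != argmax(logits_fn(tokens))


if __name__ == "__main__":
    # Toy classifier that is "fooled" (class 1) only by the word "flick".
    toy = lambda toks: [0.0, 1.0] if "flick" in toks else [1.0, 0.0]
    syn = {"flick": ["film", "movie"]}
    print(is_adversarial(["a", "dull", "flick"], syn, toy, sub_rate=1.0))
    print(is_adversarial(["a", "dull", "film"], {}, toy))
```

Since the procedure only calls the classifier on substituted inputs, it needs no change to the model architecture and no retraining, consistent with the claim in the abstract.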
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of the multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple, yet its performance is impressive: CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even when the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by fine-tuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, for which there is little precedent in the literature, is even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. With these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.